Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking
Rioux, Gabriel, Nitsure, Apoorva, Rigotti, Mattia, Greenewald, Kristjan, Mroueh, Youssef
Stochastic dominance is an important concept in probability theory, econometrics, and social choice theory for robustly modeling agents' preferences between random outcomes. While many works have been dedicated to the univariate case, little has been done in the multivariate scenario, wherein an agent has to decide between different multivariate outcomes. By exploiting a characterization of multivariate first-order stochastic dominance in terms of couplings, we introduce a statistic that assesses multivariate almost stochastic dominance under the framework of Optimal Transport with a smooth cost. Further, we introduce an entropic regularization of this statistic, and establish a central limit theorem (CLT) and consistency of the bootstrap procedure for the empirical statistic. Armed with this CLT, we propose a hypothesis testing framework as well as an efficient implementation using the Sinkhorn algorithm. We showcase our method in comparing and benchmarking Large Language Models that are evaluated on multiple metrics. Our multivariate stochastic dominance test allows us to capture the dependencies between the metrics in order to make an informed and statistically significant decision on the relative performance of the models.
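The entropic OT statistic described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the hinge-style "smooth dominance cost" (near zero when one sample coordinate-wise dominates the other), the regularization strength `eps`, and the synthetic metric vectors are all assumptions made for the example.

```python
import numpy as np

def sinkhorn(C, eps=1.0, n_iter=200):
    """Entropic OT between uniform empirical measures via Sinkhorn iterations.

    C: (n, m) cost matrix. Returns the transport plan P and the
    transport cost <P, C>, here serving as the dominance statistic.
    """
    n, m = C.shape
    a = np.full(n, 1.0 / n)          # uniform weights on the first sample
    b = np.full(m, 1.0 / m)          # uniform weights on the second sample
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # alternate scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # coupling between the two samples
    return P, float((P * C).sum())

# Hypothetical setup: rows are per-prompt metric vectors for two models.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 3))   # model A, 3 metrics
Y = rng.normal(0.3, 1.0, size=(60, 3))   # model B, shifted upward

# Smooth dominance cost: zero when y >= x in every metric, positive otherwise.
C = np.maximum(X[:, None, :] - Y[None, :, :], 0.0).sum(axis=-1)
P, stat = sinkhorn(C, eps=1.0)
# A small `stat` is evidence that Y (almost) stochastically dominates X;
# the paper's test then compares it against a CLT/bootstrap threshold.
```

The coupling `P` is exactly the object the abstract's characterization refers to: first-order dominance of Y over X holds when some coupling has zero cost, and the entropic statistic measures how far the best smooth coupling is from that ideal.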
Learning the irreversible progression trajectory of Alzheimer's disease
Wang, Yipei, He, Bing, Risacher, Shannon, Saykin, Andrew, Yan, Jingwen, Wang, Xiaoqian
Alzheimer's disease (AD) is a progressive and irreversible brain disorder that unfolds over the course of 30 years. Therefore, it is critical to capture the disease progression at an early stage such that intervention can be applied before the onset of symptoms. Machine learning (ML) models have been shown to be effective in predicting the onset of AD. Yet for subjects with follow-up visits, existing techniques for AD classification only aim for accurate group assignment, and the monotonically increasing risk across follow-up visits is usually ignored. The resulting fluctuating risk scores across visits violate the irreversibility of AD, undermining the trustworthiness of models and providing little insight into the disease progression. To address this issue, we propose a novel regularization approach to predict AD longitudinally. Our technique aims to maintain the expected monotonicity of increasing disease risk during progression while preserving expressiveness. Specifically, we introduce a monotonicity constraint that encourages the model to predict disease risk in a consistent and ordered manner across follow-up visits. We evaluate our method using the longitudinal structural MRI and amyloid-PET imaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our model outperforms existing techniques in capturing the progressiveness of disease risk, and at the same time preserves prediction accuracy.
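The monotonicity constraint the abstract describes can be sketched as a soft penalty on decreases in predicted risk across ordered visits. This is an illustrative assumption about the form of the regularizer (a squared hinge on negative visit-to-visit differences), not the paper's exact loss; the function name and data are hypothetical.

```python
import numpy as np

def monotonicity_penalty(risks):
    """Penalize decreases in predicted risk across ordered follow-up visits.

    risks: (n_subjects, n_visits) array of predicted risk scores,
    with visits in chronological order. Returns the mean squared hinge
    on negative differences, which is zero iff every subject's
    trajectory is nondecreasing.
    """
    diffs = np.diff(risks, axis=1)           # risk[t+1] - risk[t]
    violations = np.maximum(-diffs, 0.0)     # positive where risk decreased
    return float(np.mean(violations ** 2))

# A monotone trajectory incurs no penalty; a fluctuating one does.
monotone = np.array([[0.1, 0.2, 0.4, 0.7]])
fluctuating = np.array([[0.1, 0.4, 0.2, 0.7]])
print(monotonicity_penalty(monotone))      # 0.0
print(monotonicity_penalty(fluctuating))   # > 0
```

In training, a term like this would be added to the classification loss with a weight balancing ordered trajectories against predictive accuracy, matching the abstract's goal of monotonicity without sacrificing expressiveness.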
Risk Assessment and Statistical Significance in the Age of Foundation Models
Nitsure, Apoorva, Mroueh, Youssef, Rigotti, Mattia, Greenewald, Kristjan, Belgodere, Brian, Yurochkin, Mikhail, Navratil, Jiri, Melnyk, Igor, Ross, Jerret
Foundation models such as large language models (LLMs) have shown remarkable capabilities, redefining the field of artificial intelligence. At the same time, they present pressing and challenging socio-technical risks regarding the trustworthiness of their outputs and their alignment with human values and ethics [Bommasani et al., 2021]. Evaluating LLMs is therefore a multi-dimensional problem, where those risks are assessed across diverse tasks and domains [Chang et al., 2023]. In order to quantify these risks, Liang et al. [2022], Wang et al. [2023], and Huang et al. [2023] proposed benchmarks of automatic metrics for probing the trustworthiness of LLMs. These metrics include accuracy, robustness, fairness, toxicity of the outputs, etc. Human evaluation benchmarks can be even more nuanced, and are often employed when tasks surpass the scope of standard metrics. Notable benchmarks based on human and automatic evaluations include, among others, Chatbot Arena [Zheng et al., 2023], HELM [Bommasani et al., 2023], MosaicML's Eval, Open LLM Leaderboard [Wolf, 2023], and BIG-bench [Srivastava et al., 2022], each catering to specific evaluation areas such as chatbot performance, knowledge assessment, and domain-specific challenges. Traditional metrics, however, sometimes do not correlate well with human judgments.